Integrated Bioinformatics Analysis for Identifying the Significant Genes as Poor Prognostic Markers in Gastric Adenocarcinoma

Gastric adenocarcinoma (GAC) is the most common histological type of gastric cancer and imposes a considerable health burden globally. The purpose of this study was to identify significant genes and key pathways involved in the initiation and progression of GAC. Four datasets (GSE13911, GSE19826, GSE54129, and GSE79973), including 171 GAC and 77 normal tissues from the Gene Expression Omnibus (GEO) database, were collected and analyzed. Through integrated bioinformatics analysis, we obtained 69 common differentially expressed genes (DEGs) among the four datasets, including 20 upregulated and 49 downregulated genes. The prime module in the protein-protein interaction network of DEGs, including ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, BGN, and SPP1, was enriched in protein digestion and absorption, ECM-receptor interaction, focal adhesion, PI3K-Akt signaling pathway, and amoebiasis. Furthermore, expression and survival analysis found that all seven hub genes were highly expressed in GAC tissues and that six of them (all except SPP1) predicted poor prognosis of GAC. Finally, we verified the six highly expressed hub genes in GAC tissues via immunohistochemistry, Western blot, and RNA quantification analysis. Altogether, we identified six significantly upregulated DEGs as poor prognostic markers in GAC based on integrated bioinformatical methods, which could be potential molecular markers and therapeutic targets for GAC patients.


Introduction
Gastric adenocarcinoma (GAC), the predominant histological type of gastric cancer, is the fifth most common cancer, ranked after lung, breast, colorectal, and prostate cancers [1,2]. GAC, also known as stomach adenocarcinoma (STAD), accounted for more than 1,000,000 new cases and more than 768,000 deaths worldwide in 2020 [3]. Although endoscopic, surgical, and systemic treatments have improved over decades, the mortality rate of GAC is still high and the global 5-year survival rates remain unsatisfactory [1,4]. Thus, GAC still imposes a considerable health burden globally.
Although the global 5-year survival rates are relatively low, the rates in Japan and South Korea are far more optimistic [5,6], owing to early detection and screening efforts in these Asian countries [7]. Furthermore, it is reported that the 5-year survival rate of early-stage T1 GAC (according to the TNM classification of malignant tumors) is ∼95%, while advanced-stage GAC (which cannot be surgically treated) has a median survival of ∼9-10 months [8,9], which further emphasizes the critical importance of early detection and diagnosis.
Molecular markers are vital for early detection of cancer [10-12]. To date, several biomarkers have been used for the diagnosis and determination of the clinical stage of GAC. Among them, carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9), and erb-b2 receptor tyrosine kinase 2 (HER2) are the most frequently used biomarkers for GAC in the clinical setting [13,14]. However, because the current markers lack sufficient specificity and sensitivity, novel specific and sensitive molecular markers are still in urgent demand, especially for early diagnosis and prognosis [13-15]. Bioinformatics analysis is a powerful and comprehensive tool for analyzing gene expression data from multiple datasets, which makes it well suited to uncovering potential molecular markers in Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) datasets. Therefore, in the current study, we mainly focused on exploring the common differentially expressed genes among different GEO datasets. Gene ontology (GO) and KEGG enrichment analyses were further conducted to identify the key pathways enriched in the common DEGs. A protein-protein interaction (PPI) network of the DEGs was constructed, and core genes were determined via the Cytoscape Molecular Complex Detection (MCODE) app. In addition, DAVID, GEPIA, and Kaplan-Meier plotter were applied to re-analyze the enriched pathways, expression, and survival information of the core genes, respectively [16,17]. Finally, immunohistochemistry, Western blot, and RNA quantification analysis were performed to validate the expression of the identified genes in GAC tissue samples.

Microarray Data Information.
NCBI-GEO is a free public database that provides gene expression profiles for numerous cancers. The following criteria were used to screen the datasets and ensure that relevant data were recorded: (I) the samples include gastric adenocarcinoma and normal tissues; (II) the study type is expression profiling by array; (III) the species is limited to Homo sapiens; (IV) access to raw data is allowed. We obtained the gene expression profiles of GSE13911, GSE19826, GSE54129, and GSE79973 in gastric adenocarcinoma and paired normal tissues. The microarray data of GSE13911, GSE19826, GSE54129, and GSE79973 were all based on the GPL570 platform ([HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array) and included a total of 171 GAC tissues and 77 normal tissues.

DEGs Identification.
Background correction and normalization were conducted through the robust multi-array average (RMA) and Microarray Suite (MAS) approaches. The GEO2R online tools [18] were used to identify DEGs between the GAC specimens and normal specimens with |log2 FC| > 2 and an adjusted P value < 0.05 [16-18]. Then, the raw data were analyzed with online Venn software to identify the common DEGs among the original four datasets. The DEGs with log2 FC > 0 were considered upregulated genes, while the DEGs with log2 FC < 0 were considered downregulated genes [16,17].
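The thresholding and intersection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GEO2R or Venn tooling used in the study: the function names `filter_degs` and `common_degs`, the row layout (gene, log2 FC, adjusted P), and the toy values are all hypothetical. Requiring the same direction of change in every dataset is also an assumption made for the sketch.

```python
def filter_degs(rows, lfc_cutoff=2.0, padj_cutoff=0.05):
    """Keep genes with |log2 FC| > cutoff and adjusted P < cutoff.

    rows: iterable of (gene, log2_fc, adjusted_p) tuples (hypothetical layout).
    Returns {gene: log2_fc}.
    """
    return {
        gene: lfc
        for gene, lfc, padj in rows
        if abs(lfc) > lfc_cutoff and padj < padj_cutoff
    }


def common_degs(per_dataset):
    """Intersect DEG dictionaries and split the shared genes by direction."""
    shared = set.intersection(*(set(d) for d in per_dataset))
    # Assumption: a gene counts as up/down only if all datasets agree.
    up = sorted(g for g in shared if all(d[g] > 0 for d in per_dataset))
    down = sorted(g for g in shared if all(d[g] < 0 for d in per_dataset))
    return up, down


# Made-up rows for two hypothetical datasets (not values from the study).
dataset_a = [("GENE_A", 3.1, 0.001), ("GENE_B", -4.2, 0.0005),
             ("GENE_C", 0.3, 0.8), ("GENE_D", 2.5, 0.01)]
dataset_b = [("GENE_A", 2.4, 0.002), ("GENE_B", -3.8, 0.001),
             ("GENE_D", 1.5, 0.02)]

up, down = common_degs([filter_degs(dataset_a), filter_degs(dataset_b)])
print("up:", up, "down:", down)
```

In this toy input, GENE_D passes the cutoff in only one dataset and is therefore excluded from the intersection, which mirrors how a gene must be a DEG in all four datasets to be counted as common.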

GO and Pathway Enrichment Analysis.
The function and pathway enrichment of candidate DEGs was analyzed using DAVID (the Database for Annotation, Visualization and Integrated Discovery, https://david.ncifcrf.gov/) [19], an online bioinformatics tool that integrates GO and pathway enrichment analysis [20,21].
Through DAVID, we identified the biological properties of the common DEGs and visualized the DEG enrichment in molecular function (MF), cellular component (CC), biological process (BP), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (P < 0.05) [16,17].
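To make the idea of an enrichment P value concrete, the following Python sketch computes a plain hypergeometric upper-tail probability for one term. It is an illustration only: DAVID reports its own modified Fisher exact statistic, which is not reproduced here, and the function name `enrichment_p` and the example counts are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """Hypergeometric upper tail P(X >= k).

    k: genes in the query list annotated to the term
    n: size of the query list
    K: genes in the background annotated to the term
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Made-up counts: 5 of 20 query genes fall in a term that covers
# 200 of 20000 background genes.
p = enrichment_p(k=5, n=20, K=200, N=20000)
print(f"P = {p:.3g}")
```

A term would be called enriched in this sketch when the returned value falls below the chosen cutoff, for example the P < 0.05 threshold used in the text.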

Integration of PPI Network and Modular Analysis.
The STRING (Search Tool for the Retrieval of Interacting Genes, https://cn.string-db.org/) online tool [22] was used to evaluate the PPI information of DEGs. Then, the STRING app in Cytoscape [23] was employed to examine the potential correlation between these DEGs (maximum number of interactors = 0 and confidence score ≥ 0.4). In addition, the MCODE app in Cytoscape was used to identify the modules and hub genes of the PPI network (degree cutoff = 2, max. depth = 100, k-core = 2, and node score cutoff = 0.2) [16,17]. PPI network properties, such as node degree and betweenness centrality, were visualized by shape size and label font size, respectively.
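Two of the network quantities mentioned above, node degree and the k-core, can be illustrated with a short standard-library Python sketch. It does not reimplement MCODE or the STRING app: it only shows how degree is counted and how a 2-core is obtained by repeatedly pruning low-degree nodes. The function names and the toy edge list are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degrees(adj):
    """Node degree = number of distinct neighbours."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def k_core(adj, k=2):
    """Repeatedly remove nodes with fewer than k neighbours."""
    core = {node: set(nbrs) for node, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in core.items() if len(nbrs) < k]:
            for other in core[node]:
                core[other].discard(node)
            del core[node]
            changed = True
    return core


# Toy network: a triangle (A, B, C) with one pendant node D.
graph = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(degrees(graph))
print(sorted(k_core(graph, k=2)))
```

In the toy network the pendant node is pruned and the triangle remains, which is the kind of densely connected core that module detection builds on.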

Survival and RNA Sequencing Expression Analysis.
Kaplan-Meier plotter is a widely used web tool (https://kmplot.com/) that contains survival information for several cancers, including breast and gastric cancer [24]. The survival analysis was conducted with Kaplan-Meier plotter, and the log rank P value and hazard ratio (HR) with 95% confidence intervals were computed and shown on the plot. To validate the DEGs with a significant expression pattern, we applied the GEPIA (Gene Expression Profiling Interactive Analysis, https://gepia.cancer-pku.cn/) website to analyze RNA sequencing expression data based on the GTEx project and TCGA datasets [25].
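For readers unfamiliar with the survival curves behind this analysis, the Python sketch below computes the Kaplan-Meier (product-limit) survival estimate for a single group. It is an illustration under stated assumptions, not the Kaplan-Meier plotter implementation: it omits the log rank test and the hazard ratio, and the function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up durations
    events: 1 if death was observed at that time, 0 if censored
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve


# Made-up follow-up data (months) for one hypothetical expression group.
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
for t, s in curve:
    print(t, round(s, 3))
```

Comparing two such curves, one for high and one for low expression of a gene, is what the log rank P value on the plots summarizes.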

Western Blot.
GAC and adjacent normal tissue samples were ground and lysed with RIPA buffer supplemented with protease inhibitor cocktail. Protein concentrations of the extracts were measured with the BCA assay. The Western blot analysis was performed according to standard protocols using the above antibodies.

GO and KEGG Analysis of DEGs in Gastric Adenocarcinoma.
To examine the biological properties of the 69 DEGs, DAVID software was applied to conduct GO and KEGG analysis. Results of the GO analysis indicated that (1) for biological processes (BP), upregulated DEGs were particularly enriched in endodermal cell differentiation, collagen fibril organization, protein heterotrimerization, skin morphogenesis, and cell adhesion, and downregulated DEGs were enriched in digestion, potassium ion import, oxidation-reduction process, and bicarbonate transport (Figure 2 and Table 2, P < 0.05).

PPI Network and Modular Analysis of DEGs.
The DEGs PPI network complex was constructed via Cytoscape software. Results showed that 44 DEGs, including 16 upregulated and 28 downregulated genes, were included, and 75 edges were formed (Figure 4(a)). The remaining 25 DEGs were not included in the DEGs PPI network. Then, we applied Cytoscape MCODE to further analyze the prime module, and ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, BGN, and SPP1 were identified among the 44 nodes. Results also showed that the above seven hub nodes were all upregulated genes (Figure 4(b)).

Re-Analysis of Seven Hub Genes by KEGG Pathway Enrichment.
To further understand the possible enriched pathways of the seven hub DEGs, KEGG pathway enrichment was re-analyzed via DAVID. Results showed that the seven core genes were significantly enriched in several cancer-related pathways. In detail, COL1A2, COL1A1, and COL10A1 were enriched in protein digestion and absorption; COL1A2, COL1A1, and SPP1 were enriched in ECM-receptor interaction, focal adhesion, and PI3K-Akt signaling pathway; COL1A2 and COL1A1 were further enriched in amoebiasis (Figure 5 and Table 4, P < 0.05).

Analysis of Hub Genes via the GEPIA and Kaplan-Meier Plotter.
To further validate the significance of the seven central genes, the GEPIA and Kaplan-Meier plotter online tools were utilized to obtain the expression data and survival data, respectively. GEPIA expression data showed that all seven hub genes were highly expressed in GAC tissues compared to normal tissues (Figure 6 and Table 5, P < 0.05). Kaplan-Meier plotter survival data showed that high expression of ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, and BGN was associated with a significantly worse survival probability, while high expression of SPP1 showed no effect on patient survival (Figure 7 and Table 6, P < 0.05).

Validation of the Expression Levels of Six Core Genes in GAC Patients.
Finally, we detected the expression levels of the above six genes in GAC specimens and adjacent normal specimens by immunohistochemistry (Figure 8(a)), Western blot (Figure 8(b)), and RNA quantification (Figure 8(c)) analysis. Results showed that ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, and BGN were highly expressed in GAC tissues compared to adjacent normal tissues (Figure 8), consistent with the GEPIA expression data.

Discussion
Gastric adenocarcinoma is a lethal malignancy. In this study, we applied bioinformatical methods to four gene expression profile datasets to identify more useful prognostic molecular markers in GAC. A total of 171 GAC specimens and 77 normal specimens were included. First, we revealed a total of 69 common DEGs via GEO2R and Venn software (|log2 FC| > 2 and adjusted P value < 0.05), including 20 upregulated and 49 downregulated DEGs. Second, GO and KEGG pathway enrichment analysis showed that the 20 upregulated genes were enriched in endodermal cell differentiation, protein heterotrimerization, ECM-receptor interaction, focal adhesion, protein digestion and absorption, PI3K-Akt signaling pathway, amoebiasis, and platelet activation, while the 49 downregulated genes were enriched in digestion, potassium ion import, oxidation-reduction process, bicarbonate transport, inward rectifier potassium channel activity, hydrogen:potassium-exchanging ATPase activity, gastric acid secretion, retinol metabolism, and metabolic pathways (P < 0.05). Third, a DEGs PPI network complex of 44 nodes and 75 edges was constructed via Cytoscape software, and prime module analysis identified 7 hub genes (ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, BGN, and SPP1), which were all upregulated genes and were significantly enriched in several cancer-related pathways. Furthermore, GEPIA analysis showed that all seven hub genes were highly expressed in GAC tissues (P < 0.05). In addition, Kaplan-Meier plotter analysis showed that high expression of ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, and BGN was associated with a significantly worse survival probability (P < 0.05), while SPP1 showed no significant association (P > 0.05). Finally, the 6 highly expressed core genes were validated via immunohistochemistry, Western blot, and RNA quantification analysis in tissue samples.
Altogether, we identified six significantly upregulated genes as poor prognosis markers in gastric adenocarcinoma via bioinformatical analysis, which could be potential new molecular markers and effective targets for early detection and further research. The hub genes in the main module of the PPI network of the common DEGs are mainly associated with protein digestion and absorption, ECM-receptor interaction, focal adhesion, PI3K-Akt signaling pathway, and amoebiasis. The family of collagen genes (COL10A1, COL1A1, COL1A2, etc.) is tightly clustered and participates in the above cancer-related pathways. Furthermore, studies have demonstrated the close relation between collagen genes, including COL10A1, COL1A1, COL1A2, and COL8A1, and gastric adenocarcinoma. Moreover, it is well known that the PI3K-Akt signaling pathway (COL1A2, COL1A1, etc.) plays a vital role in the cell cycle and is activated in various cancers, including GAC [26]. ADAMTS2, a member of the ADAMTS family, is a procollagen N-proteinase [27]. Research has shown that ADAMTS2 participates in major biological pathways and human disorders [28], but the relation between ADAMTS2 and GAC has rarely been studied [27]. Furthermore, BGN, a key member of the small leucine-rich proteoglycan family, has been shown to participate in many cancers and is associated with poor prognosis in cancer patients, including those with gastric adenocarcinoma [29]. The results and related studies provide evidence for the relation between the hub genes, together with their enriched pathways, and GAC.
Figure 6: Expression level of the seven hub genes in gastric adenocarcinoma patients compared to healthy people. To further validate the expression level between GAC patients and normal people, seven genes were analyzed via the GEPIA website. All seven genes were significantly highly expressed in GAC specimens compared to normal specimens (*P < 0.05). Red indicates GAC tissues (n = 408), and grey indicates normal tissues (n = 211).
Expression and survival analyses have demonstrated that ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, and BGN are all highly expressed in GAC and that their high expression is associated with significantly worse survival. Previous studies have also shown that abnormal expression levels of the six hub genes could be indicators of the initiation, progression, and clinical outcome of GAC. To date, little is known about the exact mechanism of the six genes in GAC initiation and progression. Through integrated bioinformatical methods, our study provides information and direction for future studies of GAC, which may offer new perspectives and clues for early detection and diagnosis of GAC.

Conclusion
Altogether, our bioinformatics analysis identified six upregulated DEGs (ADAMTS2, COL10A1, COL1A1, COL1A2, COL8A1, and BGN) between gastric adenocarcinoma and normal tissues based on four different microarray datasets. Results showed that these six genes were poor prognostic markers, which may play key roles in the initiation and progression of GAC. The data presented in this study may provide new perspectives and clues for the early detection and therapeutic targeting of GAC. However, more experiments and details are needed to verify the prediction and underlying mechanisms in the near future.
Data Availability
The dataset supporting our findings is available at the following website: https://www.ncbi.nlm.nih.gov/geo/. All data generated or analyzed during this study are available from the corresponding author upon reasonable request.

Ethical Approval
This study was approved by the ethics committee of the General Hospital of Western Theater Command (APPROVAL NUMBER/ID: 2021ky141-1), and informed consent was exempted.

Conflicts of Interest
The authors have declared that there are no conflicts of interest.